Training data provenance
Query use case
Do we trust the providers (origin) of all training data used by an AI system? Trust is decided by checking every provider against a whitelist of email addresses.
Schemas used
Pseudo code
FUNCTION ai_system_providers_trusted_with_whitelist(AI_System_ID, Whitelist_Emails)
    // Step 1: Retrieve provider UUIDs associated with the AI system
    SET Provider_UUIDs = get list of providers contributing data to AI_System_ID
    // Step 2: Retrieve provider email addresses
    SET Provider_Emails = map provider UUIDs to their identity email addresses
    // Step 3: Check if all provider emails are in the whitelist
    IF Provider_Emails is a subset of Whitelist_Emails THEN
        RETURN True
    ELSE
        RETURN False
    END IF
END FUNCTION
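The pseudocode above can be sketched as runnable Python. This is a minimal sketch, not the query's actual implementation: the two lookups the pseudocode leaves abstract are passed in as parameters, get_provider_uuids (a hypothetical callable returning the provider UUIDs for an AI system) and provider_email_by_uuid (a hypothetical mapping from provider UUID to identity email). Treating a provider with no known email as untrusted is an assumption, not something the pseudocode specifies.

```python
from typing import Callable, Iterable, Mapping


def ai_system_providers_trusted_with_whitelist(
    ai_system_id: str,
    whitelist_emails: Iterable[str],
    get_provider_uuids: Callable[[str], Iterable[str]],  # hypothetical lookup (Step 1)
    provider_email_by_uuid: Mapping[str, str],           # hypothetical lookup (Step 2)
) -> bool:
    """Return True only if every provider's identity email is whitelisted."""
    allowed = set(whitelist_emails)
    provider_emails = set()

    # Steps 1 and 2: collect the identity email of each contributing provider.
    for provider_uuid in get_provider_uuids(ai_system_id):
        email = provider_email_by_uuid.get(provider_uuid)
        if email is None:
            # Assumption: a provider without a known identity cannot be trusted.
            return False
        provider_emails.add(email)

    # Step 3: the subset test from the pseudocode.
    return provider_emails.issubset(allowed)
```

Because the check is a subset test, an AI system with no recorded providers returns True (the empty set is a subset of any whitelist). If that is not the intended policy, an explicit emptiness check is needed.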
Explanation
- Find relevant data sources:
  - Retrieve the configuration verification credential (ConfigVcId) for the AI system.
  - Extract the weights verification credential (WeightsVcId) used in training.
  - Ensure that the WeightsVcId is classified as "Weights".
  - Trace back to the training system that produced these weights.
  - Identify the datapack used in the training process.
- Extract the list of Data Verification Credentials (DataVcIds) used in training from the datapack.
- Determine the providers who contributed this data:
  - For each DataVcId, check its attestations and extract provider UUIDs where the attestation type is "provided".
- Map provider UUIDs to their email identities.
- Check if all provider emails exist in the whitelist and return True only if every provider is trusted.
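The explanation steps can be traced end to end over a toy record store. This is a hedged sketch: the store layout and every field name in it (config_vc_id, weights_vc_id, training_system_id, datapack_id, data_vc_ids, attestations, provider_uuid, email) are hypothetical stand-ins for whatever schema the real system uses, and the helper names trace_provider_uuids and providers_trusted are illustrative only.

```python
from typing import Any, Dict, Iterable, Set

# Hypothetical record store: every ID (AI system, credential, training system,
# datapack, provider) maps to a plain dict. All field names are assumptions.
Store = Dict[str, Dict[str, Any]]


def trace_provider_uuids(store: Store, ai_system_id: str) -> Set[str]:
    """Follow AI system -> config VC -> weights VC -> training system ->
    datapack -> data VCs, collecting UUIDs from "provided" attestations."""
    config_vc = store[store[ai_system_id]["config_vc_id"]]
    weights_vc = store[config_vc["weights_vc_id"]]
    # Ensure the weights credential is classified as "Weights".
    if weights_vc.get("type") != "Weights":
        raise ValueError("weights credential is not classified as 'Weights'")
    training_system = store[weights_vc["training_system_id"]]
    datapack = store[training_system["datapack_id"]]

    provider_uuids: Set[str] = set()
    for data_vc_id in datapack["data_vc_ids"]:
        for attestation in store[data_vc_id].get("attestations", []):
            # Only attestations of type "provided" identify data providers.
            if attestation.get("type") == "provided":
                provider_uuids.add(attestation["provider_uuid"])
    return provider_uuids


def providers_trusted(store: Store, ai_system_id: str, whitelist: Iterable[str]) -> bool:
    """Map each traced provider UUID to its email and require it to be whitelisted."""
    allowed = set(whitelist)
    for provider_uuid in trace_provider_uuids(store, ai_system_id):
        email = store.get(provider_uuid, {}).get("email")
        if email is None or email not in allowed:
            return False
    return True
```

The sketch raises ValueError when the weights credential is not classified as "Weights", so a broken provenance chain fails loudly instead of being reported as trusted.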
Query
ai_system_providers_trusted_with_whitelist(AiSystemId, Whitelist)
- link to query
- link to simulator